\input memo.tex[let,jmc]
\title{Giving Computers Common Sense}
\centerline{A lecture given at Tohoku University, Sendai, Japan}
This lecture is on giving computers common sense.
I have been interested in this problem since the 1950s,
and gave a paper on it in 1958, when my hair was as black as yours.
I still haven't quite solved it.          
People keep trying,
but it seems to be a very difficult scientific problem  
to understand what is involved in common sense
well enough to make computers carry it out.

Immediately we come to a controversy, because
there are two ways of looking at common sense.
One way is to say,
``Common sense is a human property,
and therefore understanding it is a branch of psychology.''
The other way to look at it is to say that common sense refers to
certain kinds of information situations,
to the ability to achieve goals with certain information 
and lacking other information.
In this case it's a branch of computer science rather than a branch of 
psychology.

The problem is big enough so that there is room for everybody
to try his own way,
but let me say that I take the second view
and regard the common sense  problem in particular,
and artificial intelligence in general,
as a problem of computer science --- as the problem of
devising means whereby problems can be solved
using certain kinds of information.

Already in 1958 it was clear that 
it was not going to be so easy to define common sense 
or to define intelligence.
However, I made the statement in the 1958 paper, ``Programs with Common
Sense'', that a program has common sense if it can deduce for itself
sufficiently obvious consequences of what it is told and of its general
knowledge.
This gives rise to some questions:
namely, what general knowledge does a person have
that is available for use in solving problems,
and what are the obvious consequences?

Let me discuss a little bit about what is obvious and what is not.
I'll tell you a little story.
A few weeks ago,
I wanted to go to a meeting and I was late to this meeting.
I drove my car to the parking lot,
and I was carrying a rather heavy attache case.
The question was how to get to this meeting on time.
Driving the car there was not very attractive, because
there probably was not going to be any place to park.
There were several obvious possibilities:
one was to walk there with the heavy case.
Another one was to go to my office,
which was on the way, and leave the case there,
and then walk to the place of the meeting.  Perhaps I could use my
bicycle that was on the way from the car to the meeting place
and near my office.
I said,
``Well, I don't want to carry the case on the bicycle,
I am not good at that.
So maybe I'll have to walk to my office
and then walk to the bicycle''.
Then the solution came.

I claim that even though the solution is easy,
and any of you might get it, it's still not obvious.
The solution was to walk to the bicycle locker,
put the case into the bicycle locker and take the bicycle.

	Now what I want to argue is that the solution is not obvious.
Moreover, if you make a computer program with common sense,
then you will have to make it so that that solution is not obvious, i.e.
not one that the program will reach during the first stage of drawing
conclusions from the data.
The bicycle locker is a tool,
and the way in which a system that has common sense should treat a tool
is as an auxiliary to accomplishing the tool's standard purpose.
The usual purpose of the bicycle locker is to store the bicycle,
not to store something else.

Thus the point of this little story is that 
when we want to design a system that has common sense,
we don't want it to explore all the possible actions
that it has available to it,
because while the action of
using the bicycle locker to store the case was in fact correct,
there were many other possibilities of actions that could be performed,
like standing on the case,
or standing on the bicycle locker, or carrying the bicycle.
All those make no sense, but they would conceivably be explored in
a system that is not restrictive enough in its rules for generating
actions whose consequences are to be explored.

	That digression actually probably should have been later
in the lecture, but I wanted to make it while I thought about it.
Something that came along fairly early in thinking about common sense
was trying to use mathematical logic as the language in which the computer     
program would represent what it knew about the world, 
both its general knowledge
about the world and the knowledge of a particular situation.
Now this is not an inevitable decision.
In fact it's a controversial idea.
There has been controversy, the declarative versus procedural controversy,
as to whether knowledge should be represented by a computer program or
by facts in the computer's memory.
In the long run, the answer is both,
and it may even be necessary to represent the same information in both ways.
However, today's AI programs are not able to make substantial
declarative use of information that is also procedurally represented.

But it is interesting that when one person is instructing another person
on how to do something, almost all of the information that is
transmitted is declarative, i.e. facts about the situation.
What you tell the other person does not correspond to telling him
a computer program.
For communication, declarative sentences have very important 
advantages, the largest of which is that it is not necessary 
to understand another person's internal state, or internal workings 
in order to tell him facts, whereas if you imagine that he had a 
program and you wanted to debug his program, this would be like
trying to do education by brain surgery, which fortunately we 
don't have to do.

The idea of using a formal logic to describe facts about the common sense
world and to do common sense reasoning is old; it goes back to Leibniz, who
wanted to replace argument in controversies
over human affairs by calculation.
Leibniz did not get very far with his project.
He did not even invent propositional calculus, and that requires 
some remark, because propositional calculus is much simpler than 
the infinitesimal calculus of which Leibniz was a co-inventor with Newton.

Thus he solved what was technically the harder problem and didn't solve
the easy one. The reason presumably is that the concepts were
somewhat confusing, and I think the concepts are still
somewhat confusing, although we have made a lot of progress.
Probably fifty years from now, or maybe a hundred years from now,
people will look back and say, ``How could these people in the 1980s
have missed what is so obvious to us?''
Leibniz had the idea of 0 and 1 as truth 
values,
but he missed the fact that the operations on these truth values should be 
quite different than the arithmetic operations in the binary number system.

Boole, who did develop the propositional calculus, called his 
book {\it The Laws of Thought}, and he wanted to apply it to reasoning in
general.  Frege had some such ideas also, but after Frege,
it seems that mostly the mathematical logicians said,
``Well, we can do the foundations of mathematics'', and had great
success in applying mathematical logic to the foundations of mathematics,
and this success in one area caused them to neglect trying to apply it 
to common sense reasoning.  All that may be just to say that the subject
turned out to be a difficult one, and it might take quite a long time.

Now, as to what people have been trying to do in representing the facts
about the common sense world, I would say there are really two things.
One of them can be described as {\it is-a} hierarchies as in taxonomy,
where you have it that a dog is an animal and a St. Bernard is a
dog, and one considers the properties that are inherited down this
hierarchy, and inheritance with exceptions.  That has certainly had quite
a lot of use in AI.  If one looks at systems, for example the flavor systems
in Lisp, the purpose of this machinery is mainly to handle inheritance, and
whether it's to be done by logic or some other way is an interesting question.
Anyway, there are lots of ways of doing inheritance.
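To make the taxonomy example concrete, plain inheritance down such a hierarchy
can be written as ordinary implications (the predicate names here are only
illustrative):
%
$$∀x.stbernard x ⊃ dog x,\qquad ∀x.dog x ⊃ animal x,$$
%
so that a property asserted of all animals is inherited by every St. Bernard.
Handling inheritance with exceptions requires something more, such as the
non-monotonic machinery discussed later in this lecture.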

The second area in which there has been a lot of work is in formalisms
that describe the consequences of actions.  The most straightforward
way to treat describing the consequences of actions is to write a program
that will simulate the consequences of actions.  If you have clearly 
defined situations, if you have complete information about the effects of
an action, or are willing to use a model which assumes complete
information about the effects of an action, then this works pretty well.
However, it doesn't work so well if you have only partial information about
a situation and only partial information about the effects of actions.
You can then no longer simulate, because all you can determine are some facts
about the effects of the actions, some facts about the situation which
results from the action, rather than the complete state of affairs.
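For instance, to anticipate the situation calculus notation introduced later
in this lecture, complete information lets a simulator compute the entire
successor state, whereas from partial information one may be able to conclude
only isolated facts such as
%
$$location(x,result(move(x,l),s)) = l,$$
%
without determining everything else about the situation $result(move(x,l),s)$.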

But apart from simulation, there are many other things that you want to
know about the effects of actions.  For example, you may want to know that 
no sequence of actions from a certain set will achieve a goal, even though
an exhaustive search of the possible actions is impossible.
You may know, for example, that all the actions that can be performed 
leave a certain quantity constant and therefore
no sequence of actions will change this quantity.
To take a trivial example, if I wanted to be on the ceiling 
of this room, we know that no sequence of steps of walking will achieve that
result, because if I'm on the floor, any step of walking
keeps me on the floor.  We don't need to explore the consequences of
individual sequences of steps.
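Stated in terms that anticipate the situation calculus below, the argument is
an induction on an invariant: if $onfloor$ is a fluent (an illustrative one,
not part of the formalism given later) such that
%
$$∀e s.onfloor(s) ∧ walkingstep(e) ⊃ onfloor(result(e,s)),$$
%
and $onfloor$ holds in the initial situation, then it holds after any sequence
of walking steps, so no such sequence can achieve being on the ceiling.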

We also use facts about the consequences of actions
to reason about the past.  If I see something, then I can infer something
about the past.  Someone brought in this projector; probably it wasn't here
last night.
Thus we need more facts about the consequences of actions than can be obtained
by running a simulation program.

In 1969, Pat Hayes and I wrote a paper including something called the
{\it situation calculus}, which is a particularly simple formalism for treating
the consequences of actions.  It limits itself to actions that have
discrete results.  It doesn't treat a continuous action.  It doesn't treat
concurrent actions, that is, when several different people are at the same
time doing things.  Some things can be formalized reasonably in the
situation calculus.  Moreover, some of the more applied AI systems use
formalisms that are in some sense derivative from the situation calculus.

\section{Non-monotonic Reasoning}

In the late 1970s, there was progress in a new direction.  It was
discovered that the normal modes of reasoning in mathematical logic
need to be supplemented by non-monotonic reasoning.
It's interesting that the first person to raise this issue in print
may have been Minsky.  However, he raised it in a negative context;
namely, he was against the use of logic.  Minsky in general doesn't like
the use of logic in AI, even though some of his best students have used
it.  So, he said something like ``Logic can't do non-monotonic reasoning,''
and he gave the example of ``What are you going to put in your database
about birds flying?  Are you going to put in all the exceptions, that dead
birds can't fly and birds with their feet in concrete can't fly, and so
forth?''  And no matter how many things you put in your conditions, 
Minsky was ready to invent more things than you had taken into account.

Around the same time, some other people, myself
included, were coming to similar problems, but 
from a positive point of view. Namely, 
``Let us add some
methods of reasoning to mathematical logic that will
make it work for non-monotonic reasoning.''
 Let me put down a formula that has to do 
with this question.
The logicians use the following notation:
%
$$A\vdash p,$$
%
is read ``$A$ infers $p$'' and means that if $A$ is some set of sentences,
and $p$ is a sentence, then there is a proof of $p$ from $A$.
So, given the facts, $A$ being a collection of facts, 
you can prove $p$ from that collection of facts. Now 
suppose that $A$ is contained in $B$, that $B$ is some larger
collection of facts that includes all of the facts in $A$.
Then it is going to be true, in just about any mathematical
logical system that has an ordinary notion of proof, that
$p$ is provable from $B$.  The reason is trivial; namely if you have
a proof of $p$ from $A$, then exactly the same proof will work 
as a proof of $p$ from $B$.

	The logicians summarize this using the schema
%
$$A\vdash p$$
$$A ⊂ B$$
\centerline{\vbox{\hrule width3cm}}
$$B\vdash p.$$
%
	However, ordinary human reasoning is often
non-monotonic in the sense of the following
example. Suppose that you know that I own a car and that 
we are going somewhere, then you might think that perhaps
I will give you a ride. But when you learn that my car is in
California, then you no longer reach that conclusion. So, an
increase in facts causes a reduction in the conclusions.
Now, what we need to investigate there is not so much why you
took it back, why you no longer make the conclusion, but how 
did you reach the conclusion in the first place
if it was so readily retracted?  The way 
you reached the conclusion in the first place is a matter of 
argument, but let me tell you my view of it. In the more or
less standard model, the standard view of what you can expect
when someone owns a car, the car is accessible to him,
and therefore he can give someone a ride in the car.
And we make this kind of reasoning from the standard model all the time.
Now, the question is how do we formalize this logically.
Several ways have been proposed.
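One sketch of what such a formalization might look like, using the
abnormality device that appears below with the bird example and predicate
names invented just for this illustration, is
%
$$∀p s.hascar(p,s) ∧ ¬ab aspectA(p,s) ⊃ caraccessible(p,s),$$
%
so that, absent information to the contrary, minimizing $ab$ lets you conclude
that the car is accessible and hence that a ride is possible; learning that
the car is in California is what makes $aspectA$ abnormal.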

Three well-known ways were proposed in three papers that appeared
in the journal {\it Artificial Intelligence} in 1980.
One of them was called ``Non-monotonic Logic'', by
Drew McDermott and John Doyle, another one was called ``A Logic for
Default Reasoning'', by Raymond Reiter,
and the third one was called ``Circumscription'', by me.
I'm going to talk about circumscription as a means of non-monotonic reasoning.

Circumscription has had a number of different forms
because, after it is proposed in one form,
applying it to new common sense reasoning problems
leads to the discovery
that it needs to be changed and generalized.
This process has not concluded yet.

Here is one of the more straightforward forms of circumscription.
First we introduce a notion of ordering of predicates.
Let $P$ and $P'$ be predicates on the same domain.
We define
%
$$P ≤ P' ≡ (∀x)(P(x) ⊃ P'(x)).$$
%
What the $≤$ amounts to is that $P$ is, 
so to speak, no more true than $P'$, i.e.
it's true for a subset of $x$'s.


Thus what we want to talk about is a notion of minimizing the truth of $P$,
subject to some condition.  Of course if we don't have any conditions, 
we can minimize the truth of $P$ by making it always false.  Therefore,
we need to talk about constrained minimization.  Here we have
the formula called {\it circumscription}:
%
$$A(P,Z) ∧ ∀P'Z'.¬(A(P',Z') ∧ P' < P).$$
%
Here $A$ is the condition that has to
be satisfied, $P$ is what you want to minimize, and $Z$ is what
you are allowed to vary, in addition to $P$
itself, in order to achieve the minimum.
If we say that we want to minimize $P$ subject to this
condition, then it's easy enough to justify the above formula.
First of all, $P$ and $Z$ must satisfy the condition $A$.
Secondly, there should not be any alternative $P'$ and $Z'$ that satisfy
$A$ with $P'$ less than $P$. ``Less
than'' means $≤$ but not equal.
So this is the form of constrained minimization.
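Spelled out, the strict inequality used here is defined by
%
$$P' < P ≡ (P' ≤ P) ∧ ¬(P ≤ P').$$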

Actually, I've written a simplified form. 
There are some generalizations which are important in practice 
but which aren't conceptually different.
In particular, I've written it as though $P$ were a predicate of one argument,
but the same thing works if $P$ is a
predicate of several arguments. I've written it as if
there were only one $P$ that you wish to minimize, but
it can be handled in a similar way if there are several $P$'s that
you wish to minimize in parallel,
 even if they have different numbers of arguments.
I've written it as though there were only one thing 
$Z$ that you could vary besides $P$, but there could be many
of them. The only result of these generalizations is that
the formula's longer. Sparing you the longer formula
is all that I've accomplished by this. 

A fairly substantial
mathematical theory of this form of logical minimization is being
developed. The first remark to make is that circumscription yields
a formula of second order logic, because this ``for all''
here refers to predicates. If you study AI, you know that there are 
quite a few
theorem provers and problem solvers that manipulate first order
logic, but second order logic is considerably more difficult
to handle. However, it turns out that in many cases the
second order formula collapses to a first order formula, so 
you can apply your conventional methods after all. 
There is a theory, thanks to
Vladimir Lifschitz, Michael Gelfond, and others, of the
collapsible cases of circumscription.
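A standard illustration of the collapse: if the axiom $A(P)$ is just $P(a)$
and only $P$ is varied, the second order circumscription formula turns out to
be equivalent to the first order sentence
%
$$∀x.(P(x) ≡ x = a),$$
%
which says that $a$ is the only object satisfying $P$.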

The second point is that if you are
a mathematician and you see this engineer (that is what the mathematicians
call me) talking about the minimum, then you get suspicious and you say,
``How do I know this minimum exists?'' And the answer is, ``Well it
doesn't always exist,'' and so there is a theory giving sufficient
conditions  under which you can guarantee that a minimum
exists and then some examples where these conditions are not met,
in which you can show that a minimum does not exist.
There is a possibility of developing a considerable
mathematical theory of logical minimization.
Indeed, some logical problems, like
generalized induction, that have been studied by logicians for some time,
are examples of logical minimization.  In the theory
of programming, also, one talks about least fixed points.
However, the AI applications present somewhat different mathematical problems
from the ones the logicians have already studied.


As an example, I'll apply circumscription
to Minsky's problem of birds flying.  Here are the formulas.

$$∀x.([¬ab(aspect1(x)) ⊃ ] ¬flies(x)).$$

$$∀x. [¬ab aspect1 x ⊃ ] ¬flies x.$$

$$[∀x.bird x ⊃ ab aspect1 x].$$

$$∀x. bird x [∧ ¬ab aspect2 x] ⊃ flies x.$$

$$[∀x. penguin x ⊃ ab aspect2 x].$$

$$∀x.penguin x [∧ ¬ab aspect3 x] ⊃ ¬flies x.$$

$$∀x.canary x [∧ ¬ab aspect4 x] ⊃ bird x.$$

The first two formulas there are in fact copies of one
another; the second one is the first one abbreviated in a way
I like to abbreviate things.
Now I've been
careful to put brackets [] around parts of the formulas,
and first, let's look at the unbracketed part of the first formula.
If you look at this, it says ``for all $x$, not flies $x$''.
So, looked at in this way, it says that things don't fly.
Looked at with the bracketed part included, it says they don't fly unless
they are abnormal in aspect one --- symbolized by {\it aspect1}.
So, what you want to do is to write
down what is normally true but qualified by some 
abnormality.

	One of the early pitfalls in thinking
about this was to oversimplify
and write something like ``a normal bird flies.''
That's not quite right. The way you want to think about it is that
the bird that is normal in $aspect2$ flies, because if you want to say
that a normal bird has feathers, but for some reason you know
this bird doesn't fly, you still want to be able to conclude, unless
you have information to the contrary, that it has feathers.
If you simply say it's an abnormal bird, then you don't know anything about
it at all. So that is where these aspects come in. So the $ab$ in the first
formula stands for abnormal in $aspect1$.

The next thing we say is that birds are abnormal in $aspect1$.
This cancels the effect of this rule as far as birds are concerned.
I call it a ``cancellation of inheritance'' axiom. It doesn't say that a bird
can't fly, but it certainly tells you that you can't use 
this rule to say that it can't.  The rule is entirely in brackets.

If you only look at the unbracketed part of the next formula, it says
birds fly. If you include the bracketed part, it says that a bird that is
not abnormal in aspect two flies. Then we can go on to say that penguins are
abnormal in $aspect2$ and that a penguin that is not abnormal in
$aspect3$ doesn't fly, and so forth.  What you want to do in order to
apply the method of circumscription is to take the conjunction of all
these axioms and then
minimize $ab$. You want to circumscribe the predicate $ab$, so you want to
have as few things abnormal as possible.  That's the intent of this kind
of theory. It's called ``a simple abnormality theory'' and has proved to be
a moderately popular idea. Some people who don't like circumscription do
like ``ab''s and use abnormalities in some other non-monotonic formalisms.

The other thing that you have to decide before you can use 
circumscription is what's allowed to be varied. 
One thing that must be allowed to be varied in achieving the minimization
of abnormality is the predicate $flies$, because after all the whole 
point of the exercise is to make some decisions about what flies and
what doesn't fly. So that's to be varied. Now you get a nice result
here if you say that's all that varies. In other words, whether
something is a bird or not is supposed to be known and definite.
Whether something is a penguin is supposed to be known and definite.
If you do it that way, then you get the result that, as far as these facts
are concerned, in the standard model of these facts the things
that fly are those birds that are not penguins.
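In symbols, the conclusion that comes out of this circumscription is, roughly,
%
$$∀x.flies x ≡ bird x ∧ ¬penguin x.$$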

You might regard that as too strong a conclusion.
In that case you could elaborate these axioms by introducing the notion
of what objects are present in a given situation.
If you know that some birds are present, and some of them are penguins,
then you can apply this axiom, and you will say the objects that fly
in this situation are the things that are birds and not penguins.
The nice thing about this set of axioms is that you don't have
to change them to add additional facts. Suppose we want to say that
airplanes also fly. Then we have an axiom which is like this one, which
says that airplanes are abnormal in $aspect1$, and an axiom which is
like this one that says if something is an airplane and it's not abnormal
in $aspect7$, say, then it flies. 
I don't know whether you should reuse $aspect2$ or whether you need
a new one. I think it usually doesn't matter.
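Written in the same style as the axioms above, and using $aspect7$ as just
suggested, the two additional axioms would be roughly
%
$$[∀x.airplane x ⊃ ab aspect1 x],$$
%
and
%
$$∀x.airplane x [∧ ¬ab aspect7 x] ⊃ flies x.$$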
Similarly, if you want to say that ostriches are abnormal in $aspect2$,
and that ostriches in general don't fly, you can add that without any
problem. And this at least partially solves Minsky's problem, because
as fast as he thinks of these exceptions you can write them down 
without having to change anything that you already have.
Or as fast as exceptions occur, you can write them down.
Now actually this solution is only partially satisfactory.
The reason why it's somewhat satisfactory is that, if you are making
some kind of AI system, you may be able to work under conditions
in which it is definite what the birds are and what the
penguins are, and so forth and so on. But it is only partially satisfactory,
because someone may also say, ``Well, I don't know what the penguins and
the birds are in this situation.''  And then one may want to do the
circumscription on these facts, making everything variable.

By the way, I'm not talking about the qualified axiom about canaries.
I'll leave that out for the time being. If you circumscribe these
facts all together, then you get an undesirable result.
Nothing is abnormal at all. Nothing flies.
There are no birds, there are no penguins, and in fact in some sense
there are no objects. So letting everything vary is too much unless
there are more axioms.

You could say that in a particular situation, we may have some
more facts, that Joe is a penguin and Tweety is known to be a bird.
And then you have to introduce some more abnormalities, to say it's abnormal
in some aspect to be a penguin. If you do that, you can still make this 
example work. But then you come to this axiom, which is to say that there
is some other sense of the word canary, so that not all canaries are birds.

In American slang, there is another sense of canary.
In the 1930s, in American movies anyway, a canary
was a criminal who confessed.  ``He sang like a canary''
was the saying at that time.
What you would like is to say, ``We will provide for that;
we will allow for a canary that is not a bird,
but we'll just say such canaries are abnormal in $aspect4$
and rely on circumscription to get rid of them.''
In the present formalism, that doesn't quite work,        
because if you know that Joe is a canary,
what you would like to have come out as a result of your 
non-monotonic reasoning is that, since you don't know anything else,
Joe is a bird and therefore flies.
What you will get when you do the circumscription are two possibilities:
one of them is that, and the other one is that Joe is not a bird
and does not fly and furthermore is not abnormal in $aspect1$.
So when you minimize $ab$, you get an unintended solution.
What to do about these unintended solutions is something
on which people have not yet come to an agreement.
There are a number of possibilities, most involving
giving priorities to the normality of the different aspects.  However,
the canary problem is not one to which people have given
most of their attention. Most discussion has been in connection with
something called
the Yale shooting problem, which I'll come to a little later.
Now I want to talk a little bit about actions.

\section{Actions and the situation calculus}

There are entities called situations
that are considered to be states of affairs.  We
have events, and we have a new situation that results               
from an event in an old situation. 
Thus the result of
an event in a situation is a new situation.
We write

$$s' = result(e,s).$$

The most common events that we'll be interested
in are actions, so we are interested in the new
situation that results from an action. We also have
fluents, which are kinds of things like functions --
they have values. We have the common sense law of 
inertia. The common sense law of inertia takes
the form that unless you know something particular about an
event and a fluent, then the event doesn't change this
fluent. So this says that the value 
of this fluent in the result of the event is the same as the
value it had before, unless there is something
abnormal about $aspect1$ of the fluent, the event
of the situation.
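Written out, such an axiom has roughly the following form, where $value(f,s)$
stands for the value of the fluent $f$ in the situation $s$ (the function name
is just for this sketch):
%
$$∀f e s.¬ab aspect1(f,e,s) ⊃ value(f,result(e,s)) = value(f,s).$$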
Now we specialize this to the blocks world.
In it there are some kinds of objects: blocks,
which I'll denote by $x$ as a variable;
locations, which I'll denote by $l$ as a
variable; and colors, which I'll denote by $c$ as a
variable. We have some axioms concerning
two kinds of actions: one is moving
a block and the other is painting a block.
This particular form of axiomatization is designed
to solve a problem which came up in previous
formalizations, and this problem is called the frame
problem.  The frame problem is that you need
to be able to say that moving a block doesn't
move other blocks, unless you have some positive
information that it does, and painting a block
doesn't move it, and moving it doesn't change its
color, and so forth. But you want to avoid
actually having to say these things explicitly,
because you can do better than that; you can do without it,
and in particular, humans do without it.

\section{The blocks world}

	The following set of ``situation calculus'' axioms solves the
frame problem for a blocks world in which blocks can be moved and painted.
Here $result(e,s)$ denotes the situation that results when event $e$ occurs
in situation $s$.
%leql{b1}
$$∀x e s.¬ab aspect1(x,e,s) ⊃ location(x,result(e,s))
 = location(x,s).\leql{bi}$$

%leql{b2}
$$∀x e s.¬ab aspect2(x,e,s) ⊃ color(x,result(e,s)) = color(x,s).\leql{bii}$$
%
Objects change their locations and colors only for a reason.

%leql{b3}
$$∀x l s.ab aspect1(x,move(x,l),s)\leql{biii}$$
%
and

	$$∀x l s.¬ab aspect3(x,l,s) ⊃ location(x,result(move(x,l),s)) = l.$$

%leql{b4}
$$∀x c s.ab aspect2(x,paint(x,c),s)\leql{biv}$$
%
and

	$$∀x c s.¬ab aspect4(x,c,s) ⊃ color(x,result(paint(x,c),s)) = c.$$
%
Objects change their locations when moved and their colors when painted.

%leql{b5}
$$∀x l s.¬clear(top x,s) ∨ ¬clear(l,s) ∨ tooheavy x ∨ l = top x$$
$$⊃ ab aspect3(x,l,s).\leql{bv}$$
%
This prevents the
rule (\eqref{biii}) from being used to infer that an object will move
if its top isn't clear or to a destination that isn't clear
or if the object is too heavy.  An object also cannot be moved to
its own top.

%leql{b6}
$$∀l s.clear(l,s) ≡ ¬∃x.(¬trivial x ∧ location(x,s) = l).\leql{bvi}$$
%
A location is clear if all the objects there are trivial, e.g. a speck of dust.

%leql{b7}
$$∀x.¬ab aspect7 x ⊃ ¬trivial x.\leql{bvii}$$
%
Trivial objects are abnormal in $aspect7$.


Then we can put in some things with preconditions
for this moving. For example, we can say
that the top of the thing being moved should be clear; that is,
there should not be something located on top of this block.
And the place you want to move it to should
be clear. And there is also a restriction that prevents
this rule from letting you move an object to its
own top. If any of these conditions is violated,
then the move is abnormal in $aspect3$ and you can't conclude
that it succeeds. However, the same
unintended model problem that arose with the canaries
arises again. This one was discovered by Vladimir
Lifschitz, and it occurs when we attempt to
calculate the location of $B$ in the
situation that results from moving $B$ to the
top of $C$ after first moving $A$
to the top of $B$, and to ask whether its location is then the top of $C$.
In the intended model, the answer is no,
or at least you can't conclude it, because
when you move $A$ onto $B$, then $B$ no longer has a
clear top, and so therefore our rules do not permit
us to conclude that the move of $B$ onto the top of $C$ is
successful. That's the intended model.
The unintended model arises in the following
way. We're minimizing $ab$ in this whole
thing, and another minimum turns out to be
that, for a totally unknown reason, the first move
was unsuccessful, and therefore there is
nothing on the top of $B$ in the second situation,
so $B$ can be moved to the top of $C$. That's the
{\it unintended model}.
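(In the notation of the axioms above, the question under discussion is whether
%
$$location(B,result(move(B,top C),result(move(A,top B),S0))) = top C$$
%
can be concluded, where $S0$ names the initial situation; this way of writing
it is a paraphrase rather than Lifschitz's exact formulation.)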

There are various ways of getting rid of
this unintended model. Vladimir Lifschitz
invented two of them. It is not clear
which is to be preferred. Which is to be
preferred is determined by which will lead to a more
general theory; that is, merely solving this
particular problem is not sufficient. What
I want to show you is one of Lifschitz's solutions,
but applied to a simpler problem,
namely the Yale shooting problem.
This is a problem proposed by Drew McDermott
of Yale. The axioms by themselves certainly do not
describe the problem, so I'd better describe it.
The problem is again one of getting rid of an
unintended model. The actions are roughly
the following: You have somebody, I'll call him Fred,
who's alive in an initial situation $S_0$.
There is a gun. The gun gets loaded; in this formalism,
you don't need to say by whom. And then there is a
wait, about which we have no positive information
concerning what effects a wait might have. And
then the gun is shot. And so we have the following
facts about the initial situation.
The proposition $alive$ (which actually could have Fred as a parameter,
but the parameter is completely inactive, so you might as well
leave it out) holds at $S_0$: Fred is alive at $S_0$.
The gun is not loaded at $S_0$. We have the following general facts
about the effects of actions. One is that loading causes the gun
to be loaded, or causes the truth value of the fluent $loaded$
to be true.
Shooting causes $loaded$ to be false, and shooting causes $alive$
to be false. But shooting has a precondition, and that is that the
gun be loaded. These are the facts of the initial situation,
and these are the general facts about the effects of actions.
Then this theory has some general axioms, that is, axioms
which are independent of the problem of shooting Fred.

\section{Lifschitz's solution of the Yale shooting problem}

	The rest of this section is an excerpt from Lifschitz's paper.

Our axioms for the shooting problem can be classified into three groups. The first
group
describes the initial situation:
%
$$holds(alive,S0),\eqno(Y1.1)$$
$$¬ holds(loaded,S0).\eqno(Y1.2)$$
%
The second group tells us how the fluents are affected by actions:
%
$$causes(load,loaded,true),\eqno(Y2.1)$$
$$causes(shoot,loaded,false),\eqno(Y2.2)$$
$$causes(shoot,alive,false).\eqno(Y2.3)$$
%
These axioms describe the effects of 
{\sl successfully performed} actions;
they do not say {\sl when} an action can be successful.
This information is supplied separately:
%
$$precond(loaded,shoot).\eqno(Y2.4)$$
%
The last group consists of two axioms of a more general nature. We use
the abbreviations:
%
$$success(a,s) ≡ ∀f(precond(f,a) ⊃ holds(f,s)),$$
$$affects(a,f,s) ≡ success(a,s) ∧ ∃v\;causes(a,f,v).$$
%
One axiom describes how the value of a fluent changes after an action affecting
this fluent is carried out:
%
$$success(a,s) ∧ causes(a,f,v) ⊃ (holds(f,result(a,s)) ≡ v=true).\eqno(Y3.1)$$
%
(Recall that $v$ can take on two values here, $true$ and $false$; the
equivalence in $Y3.1$ reduces to $holds(f,result(a,s))$ in the first case and
to the negation of this formula in the second.)
If the fluent is not affected then its value remains the same:
%
$$¬affects(a,f,s) ⊃ (holds(f,result(a,s)) ≡ holds(f,s)).\eqno(Y3.2)$$

This axiom set obviously does not include a number of assumptions that may be
essential. The axioms do not
tell us, for instance, whether $false$ and $true$ are different from
each other and whether there are any truth values other than these two; we
do not know whether $load$, $shoot$ and $wait$ are three different actions, etc.
In this preliminary discussion, instead of making all these assumptions
explicit, we limit our attention to
the {\sl term models} of the axioms,
in which every element of the universe is
represented by a ground term of the language, and different terms represent
different objects. (Such models are also called {\sl Herbrand models} in the case of
universal theories.)
We will identify the universe of a term model with
the set of ground terms.

It remains to specify how circumscription is used. We circumscribe $causes$ and
$precond$, with the remaining predicate $holds$ allowed to vary, relative to the
conjunction $Y$ of the axioms $Y1.1$---$Y3.2$.
This will lead us to
the conclusion that $causes$ and $precond$ are true only when this is required by
the axioms of the second group, $Y2$.
The minimization of $causes$ will imply, in view of axiom
$Y3.2$, that the only changes taking place in the world are those corresponding to
axioms $Y2.1$---$Y2.3$. This solves the frame problem.
The minimization of $precond$,
in view of the definition of $success$, will imply that the only unsuccessful
actions are those in which precondition $Y2.4$ is violated.
This solves the qualification
problem.

To illustrate this last point, assume for a moment that we would like to make the
problem one step closer to the reality of assassination attempts as we know them in
our everyday life and take into account that one has to have bullets available in
order to load a gun. To introduce this qualification, we would simply
add the axiom
$$precond(bulletsavailable,load)$$
to group $Y2$, where $bulletsavailable$ is a new fluent. (We may also wish to
introduce an initial condition for this fluent and actions affecting its
value, such as $stealbullets$, $losebullets$, etc.)

\section{Comments on Lifschitz's solution}

Now a little bit about this
predicate $causes$. If you were a philosopher, you would say,
``How does he define causes? We philosophers have argued about
that for more than 2000 years, and which of us does he believe as to
what the definition of causality is?'' And the answer is ``none of
the above''; namely, we are introducing $causes$ in this formalism
as an undefined term. Maybe it corresponds to current human
usage, or, even less likely, to a current philosophical
usage; maybe it doesn't. But the hope is that it is going to be useful
for AI purposes.
So we have these two definitions, and then we have an axiom about
the effects of success, which is really analogous to the
previous axiom I had saying that if the action of moving a block
to a location $l$ was successful, then the block is at location $l$.
This axiom says that success of $a$ in $s$, together with $causes$ ($a$ causes
the fluent to have a certain value), implies that the fluent has that value.
Actually, the fluents here are specialized to have the values true and false,
although later in the same paper Lifschitz generalizes it. But since I
wanted to get all of Lifschitz's formulas on one sheet of
paper, or on the slide, I used this version rather than the other;
the other one is the blocks-world one.
The last one is the common sense law of inertia, which says that
if an action doesn't affect a fluent, then the fluent holds
in the resulting situation if and only if it held before. So it
has the same value as it had before. You can take my word, or rather
take Lifschitz's word, for the fact that he has proved that if you use
these axioms, and you circumscribe the two predicates, $causes$ and
$precond$ (that's why I made them red, so that they'd play the role of $ab$)
in the axioms, then you will get only the intended model.
The situation we are talking about is the result of $shoot$ in the
result of $wait$ in the result of $load$ at $S_0$, and in that model
we can say ``not holds alive'' in this situation. So that's what Lifschitz
can prove: if you minimize $causes$ and $precond$, then you will be able to
prove this conclusion, and the unintended model that you would get
from the simple abnormality theory doesn't occur.
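Written out in the notation of the axioms, that conclusion is
%
$$¬holds(alive,result(shoot,result(wait,result(load,S0)))).$$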
all the formulas that I want to discuss. That was perhaps too many
anyway.

Let me again say a little bit about the state of formalizing common sense.
In spite of the fact that it isn't yet decided which are the best 
ways of getting rid of this kind of unintended model, I believe
this problem will be solved fairly satisfactorily in the next few 
years. There are a large number of proposals for solving it. Let
me say just a little bit about some of them. One of McDermott's
students, Shoham, has a notion that he calls ``chronological
minimization'', which can be expressed by a logical formula. It
says that if something is going to be abnormal, if we have a 
choice as to what's going to be abnormal, we would like the 
abnormality to occur as late as possible. That's good enough for a           
number of problems, especially for predicting the effects of 
actions, but what it comes from is saying we know about the future only 
from our reasoning about the events that occur in the present, and 
if we want to know what occurs two days from now, then basically
we know about it because we reason about what happens tomorrow,
and from that we reason about what will be the case two days from now.
So this is his chronological minimization. It will work for a lot of 
AI problems. It has some limitations. The limitations perhaps arise
if you want to reason about the past from the present.
Then what you know is about the present, and you want to reason about
the day before yesterday via yesterday. That situation isn't entirely
analogous to reasoning about the future, although I've put it in a parallel form.
What it suggests is that, instead of trying to do what I call a
fully objective theory, in which we make our minimizations concerned
with what's in the world, we attempt a more subjective theory,
which has the following character. If we want to know
about $Z$, and all we know about $Z$ is derived from what we found out
about $Y$, and all we know about $Y$ is what we found out about $X$,
then we should prefer abnormalities in $Z$ to abnormalities in $Y$.
That's another sort of approach to the problem,
but because that approach involves to some extent formalizing
not merely facts about the world but facts about knowledge, it's
more difficult than the other.

	I hope I have given you some idea
of a few of the goals and the present state of the efforts to give 
computer programs common sense.
\smallskip\centerline{Copyright \copyright\ \number\year\ by John McCarthy}
\smallskip\noindent{This draft of tohoku[s87,jmc]\ TEXed on \jmcdate\ at \theTime}
\vfill\eject\end